Good-Toulmin type estimators for the number of unseen species
Abstract
Many scientific disciplines use $n$ samples to extrapolate information on the species composition of additional unobservable samples. Given a population with unknown species proportions $(p_i)_{i \ge 1}$, we consider the problem of estimating: i) the number $U_{\lambda n, r}$ of unseen species with frequency $r \ge 1$ in an additional sample of size $\lambda n$; ii) the probability $V_{\lambda n, r}$ of discovering at the $(\lambda n + n + 1)$-th draw a species with frequency $r \ge 0$ in the enlarged sample of size $n + \lambda n$. The former is related to the problem of controlling how many unseen species are rare, whereas the latter is related to the problem of evaluating the cost-effectiveness of further sampling. We introduce nonparametric empirical Bayes estimators of $U_{\lambda n, r}$ and $V_{\lambda n, r}$ and show that they estimate these quantities all the way up to λ ∝ log2(n)/2r log2(log2(n)), and that this range is the best possible. The proposed estimators are distribution-free, that is, no assumptions are imposed on $(p_i)_{i \ge 1}$. We then consider tuning our estimators for heavy-tailed $(p_i)_{i \ge 1}$. To do so, we impose regular variation as a distribution-specific assumption on the $p_i$'s. Interestingly, the resulting regularly varying estimators are asymptotically equivalent, for large $n$, to their Bayesian nonparametric counterparts under a Poisson-Dirichlet prior for $(p_i)_{i \ge 1}$.
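For orientation, the classical Good-Toulmin estimator that gives these estimators their name can be sketched in a few lines. This is not the estimator proposed in the abstract: it targets the total number of new species in an additional sample of size λn, with no frequency index r, and is known to be unstable for λ > 1 without smoothing. The function name `good_toulmin` and the toy sample are assumptions made for illustration only.

```python
from collections import Counter


def good_toulmin(sample, lam):
    """Classical Good-Toulmin estimate of the number of new species
    expected in lam * n further draws, given a sample of size n.

    Illustrative sketch only; not the estimator of the abstract above.
    """
    # Frequency of each observed species in the sample.
    counts = Counter(sample)
    # prevalences[i] = number of species observed exactly i times.
    prevalences = Counter(counts.values())
    # Alternating series: -sum_{i >= 1} (-lam)^i * prevalences[i].
    return -sum((-lam) ** i * phi for i, phi in prevalences.items())


# Hypothetical toy sample: species b and d are singletons,
# a is seen twice and c three times.
sample = ["a", "a", "b", "c", "c", "c", "d"]
estimate = good_toulmin(sample, 0.5)
```

With λ = 0.5 the series is dominated by the singleton term, which is why the estimate is driven mostly by how many species were seen exactly once.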
Similar resources
Information for: Estimating the number of unseen species: A bird in the hand is worth log n in the bush
2 Proofs for the Poisson model
2.1 Bounds for general linear estimators
2.2 Negative result for truncated Good-Toulmin estimator
2.3 Bounds on SGT estimators: arbitrary smoothing
2.4 Poisson smoothing ...
Optimal prediction of the number of unseen species.
Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher et al. [Fisher RA, Corbet AS, Williams CB (1943) J Animal Ecol 12(1):42-58], uses n samples to predict the number U of hitherto unseen species that would be observed if [Formula: see text] new samples were collected. Of considerable interest is the lar...
Abundance-based similarity indices and their estimation when there are unseen species in samples.
A wide variety of similarity indices for comparing two assemblages based on species incidence (i.e., presence/absence) data have been proposed in the literature. These indices are generally based on three simple incidence counts: the number of species shared by two assemblages and the number of species unique to each of them. We provide a new probabilistic derivation for any incidence-based ind...
A new statistical approach for assessing similarity of species composition with incidence and abundance data
Anne Chao, Robin L. Chazdon, Robert K. Colwell and Tsung-Jen Shen. Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan; Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA. Abstract: The classic Jaccard and Sørensen indices of compositional similarity (and other indices that depend upon the same v...
The Ratio-type Estimators of Variance with Minimum Average Square Error
Ratio-type estimators were introduced for estimating the population mean and total, but in recent years several estimators of the population variance based on ratio methods have been proposed. In this paper, two families of estimators are suggested and their approximate mean square errors (MSE) are derived. In addition, the efficiency of these variance estimators is com...